STATISTICS OF EXTREMES BY ORACLE ESTIMATION

By Ion Grama and Vladimir Spokoiny 

University of South Brittany and Weierstrass Institute

We use the fitted Pareto law to construct an accompanying approximation of the excess distribution function. A selection rule of the location of the excess distribution function is proposed based on a stagewise lack-of-fit testing procedure. Our main result is an oracle-type inequality for the Kullback-Leibler loss.

1. Background and outline of main results. Let $X_1, \ldots, X_n$ be i.i.d. observations with continuous d.f. $F$ supported on the interval $[x_0, \infty)$, $x_0 > 0$. Assume that the d.f. $F$ is "heavy tailed," that is, that $F$ belongs to the domain of attraction of the Fréchet law $\Phi_{1/\gamma}(x) = \exp(-x^{-1/\gamma})$, $x > 0$, with parameter $1/\gamma$. By the Fisher-Tippett-Gnedenko theorem (see Bingham, Goldie and Teugels [2]) this is equivalent to saying that for any $x > 1$,

(1.1) $F_t(x) \to P_\gamma(x)$ as $t \to \infty$,

where $F_t(x)$ is the excess d.f. over the threshold $t > x_0$ defined by

$F_t(x) = 1 - \frac{1 - F(xt)}{1 - F(t)}$, $x \ge 1$,

and $P_\theta(x) = 1 - x^{-1/\theta}$, $x \ge 1$, is the standard Pareto d.f. with parameter $\theta > 0$. Relation (1.1) suggests using $P_{\hat\gamma}(x)$ with estimated $\gamma$ as an approximation of $F_t(x)$ for a given $x$ and large $t$. However, it can be misleading in cases when the convergence to the limit distribution is too slow. This is easily seen by inspecting the trajectories of the Hill estimator (1.2) computed from samples drawn from the log-gamma distribution $F(x)$, see Figure 1. It is sometimes called the Hill horror plot, because of the important discrepancy between the Hill estimator and the parameter $\gamma$ to be estimated, even for very large sample sizes (see Embrechts, Klüppelberg and Mikosch [7] or Resnick [20]). The explanation lies in the fact that the Hill estimator merely fits a Pareto distribution to the data, thereby providing an approximation of the excess d.f. $F_t$ rather than of $\gamma$ itself. Despite this evidence, the problem of estimating the excess d.f. $F_t$ regardless of the limit is less studied in the literature.

Fig. 1. (1) The Hill estimator $h_{n,k}$, $k = 1, \ldots, n$, for the log-gamma d.f. with rate parameter 1 and shape parameter 2. (2) Index of regular variation $\gamma = 1$ which is expected to be estimated. (3) The fitted Pareto parameter $\theta_t(F)$ computed from the approximation formulas (4.4), (4.8). Left: 1 realization; Right: 100 realizations.
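The discrepancy described above can be reproduced numerically. The following is a minimal sketch, not the authors' simulation code: it draws a log-gamma sample (shape 2, rate 1, so $\gamma = 1$) and compares the Hill estimator with the fitted Pareto index. The closed form $\theta_t(F) = 1 + 1/(1 + \log t)$ used here is derived for this illustration from the tail $1 - F(x) = (1 + \log x)/x$, $x \ge 1$, and the function names, seed and sample sizes are assumptions of the sketch.

```python
import math
import random

def hill(sample, k):
    """Hill estimator h_{n,k} built from the k largest observations."""
    xs = sorted(sample, reverse=True)      # xs[0] = X_{n,1} >= xs[1] = X_{n,2} >= ...
    return sum(math.log(xs[i] / xs[k]) for i in range(k)) / k

def fitted_index_log_gamma(t):
    """Fitted Pareto index for the log-gamma d.f. with shape 2 and rate 1.

    Derived for this sketch from 1 - F(x) = (1 + log x) / x, x >= 1.
    """
    return 1.0 + 1.0 / (1.0 + math.log(t))

rng = random.Random(0)                     # fixed seed, chosen arbitrarily
n, k = 20000, 2000
# X = exp(G) with G ~ Gamma(shape=2, scale=1) has a log-gamma d.f. with gamma = 1
sample = [math.exp(rng.gammavariate(2.0, 1.0)) for _ in range(n)]
threshold = sorted(sample, reverse=True)[k]    # t = X_{n,k+1}
h = hill(sample, k)
theta_t = fitted_index_log_gamma(threshold)
print(f"Hill estimate {h:.3f}, fitted index {theta_t:.3f}, gamma = 1")
```

In this sketch the Hill estimate tracks the fitted index at the threshold rather than $\gamma = 1$, which is the behavior the Hill horror plot displays.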

The goal of the present paper is twofold. First, we consider the problem of recovering the excess d.f. $F_t$ from the data $X_1, \ldots, X_n$ directly; second, we propose an adaptive procedure for the choice of the location of the tail $t$. Motivated by (1.1), we assume that for large values of $t > x_0$ the excess d.f. $F_t$ can be approximated by a Pareto law $P_{\theta_t}$ with some index $\theta_t > 0$ possibly depending on the location $t$ and generally different from $\gamma$. The statistical problem is that of recovering $F_t$ by constructing a family of estimators $\hat\theta_{n,t}$, $t > x_0$, of the parameters $\theta_t$, $t > x_0$, and proposing an adaptive rule for choosing the location threshold $t$.

Some consequences of the main results of the paper are formulated below. Let $X_{n,1} \ge X_{n,2} \ge \cdots \ge X_{n,n}$ be the order statistics pertaining to $X_1, \ldots, X_n$ and $h_{n,k}$, $k = 1, \ldots, n-1$, be the family of Hill estimators, where

(1.2) $h_{n,k} = \frac{1}{k} \sum_{i=1}^{k} \log \frac{X_{n,i}}{X_{n,k+1}}$,

see Hill [14]. Denote $\hat n_t = \sum_{i=1}^{n} 1(X_{n,i} > t)$ and $\hat\theta_{n,t} = h_{n,\hat n_t}$, where $\hat\theta_{n,t} = 0$ if $\hat n_t = 0$.

The discrepancy between two equivalent probability laws $P$ and $Q$ is measured by the Kullback-Leibler divergence $\mathcal K(P, Q) = \int \log \frac{dP}{dQ}\, dP$ and by the $\chi^2$-divergence $\chi^2(P, Q) = \int \frac{dP}{dQ}\, dP - 1$. For any $t > x_0$ the best approximation of the excess d.f. $F_t$ is defined by looking for the "closest" element in the set of Pareto distributions. Let $\theta_t(F) = \arg\min_{\theta > 0} \mathcal K(F_t, P_\theta)$ be the minimum Kullback-Leibler divergence Pareto parameter, called in the sequel for short the fitted Pareto index. Thereafter $\mathbf P_F$ denotes the probability measure corresponding to the i.i.d. observations $X_1, \ldots, X_n$ with d.f. $F$.

Instead of (1.1), assume that $F$ admits an accompanying Pareto tail, which means that $\chi^2(F_t, P_{\theta_t(F)}) \to 0$ as $t \to \infty$. This condition is not very restrictive and defines a class of d.f.'s related to those in Hall and Welsh [12] and Drees [6]. Then, according to our Theorem 4.4,

$\mathcal K(F_{\tau_n}, P_{\hat\theta_{n,\tau_n}}) = O_{\mathbf P_F}\left(\frac{\log n}{n(1 - F(\tau_n))}\right)$ as $n \to \infty$,

for any sequence $\{\tau_n\}$ obeying

(1.3) $\chi^2(F_{\tau_n}, P_{\theta_{\tau_n}(F)}) = O\left(\frac{\log n}{n(1 - F(\tau_n))}\right) \to 0$ as $n \to \infty$.

The sequence $\{\tau_n\}$ in the definition of the estimator $\hat\theta_{n,\tau_n}$ being, generally, unknown, we give an automatic selection rule $\hat k_n$ (Section 3) such that the adaptive estimator $\hat\theta_n = h_{n,\hat k_n}$ mimics the nonadaptive estimator $\hat\theta_{n,\tau_n}$, that is,

$\mathcal K(F_{\tau_n}, P_{\hat\theta_n}) = O_{\mathbf P_F}\left(\frac{\log n}{n(1 - F(\tau_n))}\right)$ as $n \to \infty$,

for any sequence of locations $\{\tau_n\}$ obeying (1.3), see Theorem 4.10. From the results in Hall and Welsh [12] and Drees [6] it follows that the estimators $\hat\theta_{n,\tau_n}$ and $\hat\theta_n$ attain optimal or suboptimal rates in some classes of functions (see Section 5 for details).

Many results on the adaptive choice of the number $k$ of upper statistics involved in the estimation require prior knowledge of the unknown d.f. $F$. A peculiarity of the adaptive procedure proposed in the paper is that it applies to an arbitrary d.f. with a Pareto-like tail and does not require additional information on its structure. In particular, $F$ need not even be regularly varying at infinity, that is, it need not satisfy (1.1).

The paper is organized as follows. In Section 2 we construct the local likelihood estimators. Section 3 introduces the adaptive procedure for selecting the threshold $t$. The main results of the paper are presented in Section 4. Examples of computing the optimal rates of convergence are given in Section 5. We illustrate the performance of our results on some artificial data sets in Section 6. In Sections 7 and 8 we prove exponential-type bounds for the likelihood ratio used in the proofs of our main results, together with the necessary auxiliary statements.
2. Construction of the estimators. Let $\mathcal F$ be the set of all d.f.'s $F$ having support on the interval $[x_0, \infty)$ with $x_0 > 0$, and admitting a strictly positive density $f_F$ w.r.t. Lebesgue measure. For any $t > x_0$ define the excess d.f. over the threshold $t$ as

(2.1) $F_t(x) = 1 - \frac{1 - F(tx)}{1 - F(t)}$, $x \ge 1$.

It is easy to see that

$F_t(x) = 1 - \exp\left(-\int_t^{tx} \frac{du}{u\,\alpha_F(u)}\right)$,

where

(2.2) $\alpha_F(u) = \frac{1}{u\,\lambda_F(u)}$, $u > x_0$,

and $\lambda_F(u) = \frac{f_F(u)}{1 - F(u)}$, $u > x_0$, is the hazard rate. The d.f. $F_t$ admits the density

(2.3) $f_{F_t}(x) = \frac{t f_F(tx)}{1 - F(t)} = \frac{1}{x\,\alpha_F(tx)} \exp\left(-\int_t^{tx} \frac{du}{u\,\alpha_F(u)}\right)$, $x \ge 1$.

Note that, according to the von Mises theorem, if there exists a constant $\alpha > 0$ such that $\alpha_F(x) \to \alpha$ as $x \to \infty$, then $F$ is regularly varying with the index of regular variation $\alpha$, see Beirlant et al. [1].

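Recall that given $X_{n,k+1} = t$ the observations $X_{n,1}/t, \ldots, X_{n,k}/t$ are the order statistics of an i.i.d. sequence with common density $f_{F_t}$ (see Reiss [19]). Motivated by this we define the local log-likelihood function

(2.4) $L_{n,t}(F) = \sum_{i: X_i > t} \log f_{F_t}(X_i/t)$.

Let $\mathcal K(\theta', \theta) = \mathcal K(P_{\theta'}, P_\theta)$ be the Kullback-Leibler divergence between $P_{\theta'}$ and $P_\theta$,

(2.5) $\mathcal K(\theta', \theta) = \int \log \frac{dP_{\theta'}}{dP_\theta}\, dP_{\theta'} = G\left(\frac{\theta'}{\theta} - 1\right)$, $\theta', \theta > 0$,

where $G(x) = x - \log(1 + x)$. We extend this definition by setting $\mathcal K(\theta', \theta) = \infty$ if at least one of $\theta' = 0$ or $\theta = 0$ holds. Lemma 8.1 implies

$\mathcal K(\theta_1, \theta_2) \asymp \left(\frac{\theta_1}{\theta_2} - 1\right)^2$ as $\frac{\theta_1}{\theta_2} - 1 \to 0$.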
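Formula (2.5) is easy to check numerically. The sketch below is an illustration with assumed function names: it evaluates the closed form $G(\theta'/\theta - 1)$ and compares it with a direct quadrature of $\int \log(dP_{\theta'}/dP_\theta)\, dP_{\theta'}$, using the fact that $\log X$ is exponential with mean $\theta'$ under $P_{\theta'}$.

```python
import math

def G(x):
    """G(x) = x - log(1 + x), defined for x > -1."""
    return x - math.log1p(x)

def pareto_kl(theta1, theta2):
    """Closed form (2.5) for K(theta1, theta2), with the convention K = inf at 0."""
    if theta1 <= 0 or theta2 <= 0:
        return math.inf
    return G(theta1 / theta2 - 1.0)

def pareto_kl_numeric(theta1, theta2, steps=100000, umax=60.0):
    """Midpoint quadrature of E log(dP_theta1/dP_theta2) under P_theta1.

    With u = log x, the law of u under P_theta1 is exponential with mean theta1,
    and the log density ratio equals log(theta2/theta1) + (1/theta2 - 1/theta1) * u.
    """
    h = umax * theta1 / steps
    total = 0.0
    for j in range(steps):
        u = (j + 0.5) * h
        log_ratio = math.log(theta2 / theta1) + (1.0 / theta2 - 1.0 / theta1) * u
        total += log_ratio * math.exp(-u / theta1) / theta1
    return total * h

print(pareto_kl(1.5, 1.0), pareto_kl_numeric(1.5, 1.0))
```

For nearby indices the divergence is close to $(\theta_1/\theta_2 - 1)^2/2$, since $G(x) = x^2/2 + O(x^3)$ as $x \to 0$.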
2.1. Pareto-type tails. Let $\mathcal F_t$ be the set of functions $F \in \mathcal F$ satisfying $\alpha_F(x) = \theta$, for $x \in (t, \infty)$, where $\theta > 0$ and $t > x_0$. If $F \in \mathcal F_t$ then the d.f. $F_t$ is exactly the Pareto d.f. $P_\theta$. Maximization of the local log-likelihood (2.4) over $\mathcal F_t$ gives the maximum local quasi-likelihood estimator

(2.6) $\hat\theta_{n,t} = \frac{1}{\hat n_t} \sum_{i: X_i > t} \log \frac{X_i}{t}$,

where $\hat n_t = \sum_{i=1}^{n} 1(X_i > t)$ denotes the number of observations in the interval $(t, \infty)$. Here and in the sequel the indeterminacy 0/0 arising in the definition of the estimators is understood as 0, that is, for $t \ge X_{n,1}$ the estimator $\hat\theta_{n,t}$ is defined to be 0. Although $\hat\theta_{n,t}$ is not exactly the Hill estimator, it is closely related. In fact, if $t = X_{n,k+1}$, where $1 \le k \le n-1$, then $\hat\theta_{n,t} = \hat\theta_{n,X_{n,k+1}}$ coincides with the Hill estimator $h_{n,k}$, see (1.2).
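The relation between (2.6) and (1.2) can be illustrated on a small data set. This is a minimal sketch with assumed function names and made-up observations (all distinct, so that exactly $k$ observations exceed $X_{n,k+1}$).

```python
import math

def theta_hat(sample, t):
    """Local estimator (2.6); the indeterminacy 0/0 is read as 0."""
    logs = [math.log(x / t) for x in sample if x > t]
    return sum(logs) / len(logs) if logs else 0.0

def hill(sample, k):
    """Hill estimator (1.2) for 1 <= k <= n - 1."""
    xs = sorted(sample, reverse=True)      # xs[k] is X_{n,k+1} in 1-based notation
    return sum(math.log(xs[i] / xs[k]) for i in range(k)) / k

sample = [1.3, 9.1, 2.2, 4.7, 1.1, 3.5, 6.0, 1.8]   # illustrative values only
order = sorted(sample, reverse=True)
for k in range(1, len(sample)):
    # at the threshold t = X_{n,k+1} the two estimators agree
    print(k, theta_hat(sample, order[k]), hill(sample, k))
```

For a threshold strictly between two order statistics the two quantities differ, because (2.6) divides by $t$ while (1.2) divides by the next order statistic.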

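Let $L_{n,t}(\theta', \theta) = L_{n,t}(P_{\theta'}) - L_{n,t}(P_\theta)$ be the log of the local likelihood ratio of $P_{\theta'}$ w.r.t. $P_\theta$. By elementary calculation one can see that

(2.7) $L_{n,t}(\hat\theta_{n,t}, \theta) = \hat n_t\, \mathcal K(\hat\theta_{n,t}, \theta)$.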



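2.2. Pareto change point-type tails. Let $\mathcal F_{t,\tau}$ be the set of functions $F \in \mathcal F$ having the change point structure: $\alpha_F(x) = \theta_1$, for $x \in [t, \tau)$, $\alpha_F(x) = \theta_2$, for $x \in [\tau, \infty)$, where $\theta_1, \theta_2 > 0$ and $1 < t < \tau < \infty$. Of course $\mathcal F_t \subset \mathcal F_{t,\tau}$. If $F \in \mathcal F_{t,\tau}$, then the d.f. $F_t$ coincides with the Pareto change point d.f.

$P_{\theta_1,\theta_2,\tau/t}(x) = 1 - \exp\left(-\int_1^x \frac{du}{u\,\alpha'(u)}\right)$,

where $\alpha'(x) = \theta_1$, for $x \in [1, \tau/t)$, $\alpha'(x) = \theta_2$, for $x \in [\tau/t, \infty)$. For given $t < X_{n,1}$ and $\tau > t$ maximization of the local likelihood (2.4) over $\mathcal F_{t,\tau}$ gives the maximum likelihood estimator $(\hat\theta_{n,t,\tau}, \hat\theta_{n,\tau})$, where

$\hat\theta_{n,t,\tau} = \frac{\hat n_t \hat\theta_{n,t} - \hat n_\tau \hat\theta_{n,\tau}}{\hat n_{t,\tau}}$

and $\hat n_{t,\tau} = \hat n_t - \hat n_\tau = \sum_{i=1}^{n} 1(t < X_i \le \tau)$ is the number of observations in the interval $(t, \tau]$. As above, $\hat\theta_{n,t,\tau} = 0$ if $t \ge X_{n,1}$.

Denote by $L_{n,t}(\theta_1, \theta_2, \tau, \theta) = L_{n,t}(P_{\theta_1,\theta_2,\tau/t}) - L_{n,t}(P_\theta)$ the local log-likelihood ratio corresponding to the Pareto change point model $P_{\theta_1,\theta_2,\tau/t}(x)$ with respect to the Pareto model $P_\theta$. By straightforward calculations it is verified that

(2.8) $L_{n,t}(\hat\theta_{n,t,\tau}, \hat\theta_{n,\tau}, \tau, \theta) = \hat n_{t,\tau}\, \mathcal K(\hat\theta_{n,t,\tau}, \theta) + \hat n_\tau\, \mathcal K(\hat\theta_{n,\tau}, \theta)$.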
3. Adaptive selection of the location of the tail. Several procedures have been proposed in the literature for the choice of the number of upper statistics to be used in the estimation of the index of regular variation. We refer to Beirlant et al. [1], and to the references therein [3, 5, 8, 11, 13, 15]. However, one should note that most of these procedures require some prior knowledge of the d.f. $F$.

To illustrate the problem let us recall the main result in Hall and Welsh [12] (see also Drees [6]). Let $F$ be a d.f. with density

(3.1) $f_F(x) = d\alpha x^{-(\alpha+1)}(1 + r(x))$, $|r(x)| \le A x^{-\alpha\rho}$, $x > 0$,

where $|\alpha - \alpha_0| \le \varepsilon$, $|d - d_0| \le \varepsilon$ and $d_0, \varepsilon, \rho, A > 0$. It is proved that the optimal rate of convergence that can be achieved for estimating $\alpha = 1/\beta$ is $n^{-\rho/(2\rho+1)}$. This optimal rate is attained for the Hill estimator $h_{n,k_n}$ with the choice $k_n \sim n^{2\rho/(2\rho+1)}$ depending on $\rho$. An adaptive estimator can be constructed by estimating $\rho$ and plugging this estimate into the optimal $k_n$. This approach requires us to know in advance the class of distributions $F$, but generally this information is not available in practice. It is also too conservative in the sense that it is oriented to the worst case in the given class, whereas particular distributions may have nicer properties.

In this paper we give a selection procedure which is distribution free and attains exactly or nearly optimal rates for each particular law $F$, in contrast to minimax estimation which is oriented to the worst case in a given class of functions. These kinds of results are usually related to the so-called oracle inequalities (Donoho and Johnstone [4]).

The selection rule of the location of the tail $\tau$ which we propose is based on stagewise lack-of-fit testing for the Pareto distribution (see also Grama and Spokoiny [9]). It can be compared with the adaptive procedures for selecting the bandwidth in nonparametric pointwise function estimation, see Lepski [16], Lepski and Spokoiny [17]. Drees and Kaufmann [5] give a variant of the latter adapted to tail index estimation. A stagewise procedure for testing the Pareto d.f. has been proposed and its performance analyzed in Hall and Welsh [13], where it was shown that the choice based on the detection of the lack-of-fit point introduces a significant bias. Our procedure differs from these approaches since the point of lack-of-fit serves just as a pilot for the choice of $k$.

3.1. The lack-of-fit test. Denote by $[a]$ the integer part of $a$. Assume that the sequence of positive integers $\{K_n\}$ satisfies $K_n \le n$ and $\lim_{n\to\infty} K_n = \infty$. Consider the uniform grid $r_i = r_i(n) = [in/K_n]$, $i = 1, \ldots, K_n$. In particular, if $K_n = n$ we have $r_i = i$, for $i = 1, \ldots, n$. Let $k_0$ be a positive integer much smaller than $n$.

We shall choose the location of the tail of $F$ in the random set $\{X_{n,r_i} : i = k_0, \ldots, K_n\}$ and therefore the problem reduces to the choice of the natural number $r_i$. We shall proceed by local change-point detection, which consists in consecutive testing for the null hypothesis that, conditionally on $X_{n,r_m+1} = s$, the observations $X_{n,1}/s, \ldots, X_{n,r_m}/s$ are the order statistics of an i.i.d. sample with a Pareto d.f. $P_\theta$, against the alternative that, conditionally on $X_{n,r_m+1} = s$, the observations $X_{n,1}/s, \ldots, X_{n,r_m}/s$ are the order statistics of an i.i.d. sample with a Pareto change-point d.f. $P_{\theta_1,\theta_2,\tau/s}$, for all $m = r_{k_0}, \ldots, r_{K_n}$.

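For testing the null hypothesis against the alternative we shall make use of the likelihood ratio statistic $T_n(t, \tau)$ which is defined by

(3.2) $T_n(t, \tau) = \sup_{F \in \mathcal F_{t,\tau}} L_{n,t}(F) - \sup_{F \in \mathcal F_t} L_{n,t}(F) = L_{n,t}(\hat\theta_{n,t,\tau}, \hat\theta_{n,\tau}, \tau, \hat\theta_{n,t})$,

for $x_0 \le t < \tau$. Taking into account (2.8) one gets

(3.3) $T_n(t, \tau) = T_n^{(1)}(t, \tau) + T_n^{(2)}(t, \tau)$, $t < \tau$,

where

$T_n^{(1)}(t, \tau) = \hat n_{t,\tau}\, \mathcal K(\hat\theta_{n,t,\tau}, \hat\theta_{n,t})$, $T_n^{(2)}(t, \tau) = \hat n_\tau\, \mathcal K(\hat\theta_{n,\tau}, \hat\theta_{n,t})$.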
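The two components of (3.3) can be computed directly from the definitions in Section 2. The following is a minimal sketch with assumed function names and illustrative data; treating an empty segment as contributing 0 to the statistic is a convention of this sketch, not something stated in the text.

```python
import math

def kl(a, b):
    """K(a, b) = G(a/b - 1) between Pareto laws, see (2.5); infinite if an index is 0."""
    if a <= 0 or b <= 0:
        return math.inf
    r = a / b - 1.0
    return r - math.log1p(r)

def count_and_theta(sample, t):
    """Number of exceedances of t and the local estimator (2.6); 0/0 is read as 0."""
    logs = [math.log(x / t) for x in sample if x > t]
    return len(logs), (sum(logs) / len(logs) if logs else 0.0)

def change_point_theta(sample, t, tau):
    """Estimator of the Pareto index on (t, tau] from Section 2.2."""
    n_t, th_t = count_and_theta(sample, t)
    n_tau, th_tau = count_and_theta(sample, tau)
    n_mid = n_t - n_tau
    return (n_t * th_t - n_tau * th_tau) / n_mid if n_mid else 0.0

def lr_statistic(sample, t, tau):
    """Return (T1, T2) from (3.3) for thresholds t < tau.

    Empty segments contribute 0 (an assumption of this sketch).
    """
    n_t, th_t = count_and_theta(sample, t)
    n_tau, th_tau = count_and_theta(sample, tau)
    n_mid = n_t - n_tau
    t1 = n_mid * kl(change_point_theta(sample, t, tau), th_t) if n_mid else 0.0
    t2 = n_tau * kl(th_tau, th_t) if n_tau else 0.0
    return t1, t2

sample = [1.2, 1.5, 2.0, 2.4, 3.1, 4.0, 5.5, 7.0, 9.5, 20.0]   # illustrative values
t1, t2 = lr_statistic(sample, t=1.0, tau=3.0)
print(t1, t2, t1 + t2)
```

Because both terms are Kullback-Leibler divergences weighted by counts, the statistic is nonnegative and vanishes only when the segment estimates coincide with $\hat\theta_{n,t}$.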
For each $m$ and $k < m$ consider the test statistics

(3.4) $T_{n,m} = \max_{\rho m \le k \le (1-\delta)m} T_{n,m,k}$, $T_{n,m,k} = T^{(1)}_{n,m,k} + T^{(2)}_{n,m,k}$,

where

$T^{(i)}_{n,m,k} = T^{(i)}_n(X_{n,m}, X_{n,k})$, $i = 1, 2$,

and $\rho$ and $\delta$ are constants satisfying $0 < \rho, \delta < \frac{1}{2}$. We shall suppose that $\delta$ is so large that $(1 - \delta) r_i \le r_{i-1}$, for all $i = k_0, \ldots, K_n$. Actually this condition is satisfied for any given $\delta > 0$ when $n$ becomes sufficiently large. We shall also assume that $\rho r_{k_0} \ge r_1$.

The null hypothesis will be rejected if $T_{n,r_m} > \mathfrak z_n$, for some critical value $\mathfrak z_n = \mu \log n$, where $\mu$ is a positive constant.

3.2. The adaptive procedure. At this stage the required parameters are the number of points on the grid $K_n$, the starting point $k_0$, two numbers $\rho$ and $\delta$ which determine the size of the testing window, and the critical value $\mathfrak z_n$.

The procedure of the adaptive choice of the value $\hat k_n$ reads as follows:

Initialize. Set $i = k_0$.

Step 1. Compute the test statistic $T_{n,r_i}$ by (3.4).

Step 2. If $i \le K_n$ and $T_{n,r_i} \le \mathfrak z_n$, increase $i$ by 1 and repeat the procedure from Step 1. If $i \le K_n$ and $T_{n,r_i} > \mathfrak z_n$, define

(3.5) $\hat k_n = \arg\max_{\rho r_i \le k \le (1-\delta) r_i} T^{(2)}_{n,r_i,k}$,

and exit the procedure. If $i > K_n$ we define $\hat k_n = n$ and exit the procedure.

The described procedure is equivalent to defining the adaptive value

(3.6) $\hat k_n = \arg\max_{\rho \hat m_n \le k \le (1-\delta)\hat m_n} T^{(2)}_{n,\hat m_n,k}$,

where

(3.7) $\hat m_n = \min\{r_i : T_{n,r_i} > \mathfrak z_n,\ i = k_0, \ldots, K_n\}$,

with the convention $\min \emptyset = r_{K_n}$. The adaptive location of the tail $\tau$ is then defined by $\hat\tau_n = X_{n,\hat k_n}$ and the adaptive estimator is set to

$\hat\theta_n = h_{n,\hat k_n}$.
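The stagewise procedure can be sketched in a few lines for the simplified grid $K_n = n$ (so $r_i = i$). This is an illustrative sketch under stated assumptions, not the authors' implementation: it assumes no ties in the sample, takes the maximized quantity in (3.5) to be the second component $T^{(2)}$ (the superscript is not legible in every copy of the text), and returns $k = n$ when no test rejects, as in Step 2. The names `adaptive_k` and `window_statistics` are hypothetical.

```python
import math
import random

def kl(a, b):
    """K(a, b) = G(a/b - 1) with G(x) = x - log(1 + x), see (2.5)."""
    if a <= 0 or b <= 0:
        return math.inf
    r = a / b - 1.0
    return r - math.log1p(r)

def window_statistics(lx, S, m, rho, delta):
    """Yield (k, T1 + T2, T2) for rho*m <= k <= (1-delta)*m with t = X_{n,m}, tau = X_{n,k}.

    lx[i] = log X_{n,i} (1-based) and S[j] = lx[1] + ... + lx[j]; no ties are assumed.
    """
    n_t = m - 1                                  # observations strictly above X_{n,m}
    a_t = S[m - 1] - n_t * lx[m]                 # sum of log(X_i / t) over X_i > t
    th_t = a_t / n_t
    for k in range(max(2, math.ceil(rho * m)), math.floor((1 - delta) * m) + 1):
        n_tau = k - 1
        a_tau = S[k - 1] - n_tau * lx[k]
        n_mid = n_t - n_tau
        t1 = n_mid * kl((a_t - a_tau) / n_mid, th_t)
        t2 = n_tau * kl(a_tau / n_tau, th_t)
        yield k, t1 + t2, t2

def adaptive_k(sample, k0, rho=0.25, delta=0.05, z=10.0):
    """Return the adaptive value of k for the grid r_i = i (K_n = n)."""
    if k0 < 2:
        raise ValueError("k0 must be at least 2 in this sketch")
    xs = sorted(sample, reverse=True)
    n = len(xs)
    lx = [0.0] + [math.log(x) for x in xs]
    S = [0.0] * (n + 1)
    for i in range(1, n + 1):
        S[i] = S[i - 1] + lx[i]
    for m in range(k0, n + 1):
        stats = list(window_statistics(lx, S, m, rho, delta))
        if stats and max(s[1] for s in stats) > z:       # lack of fit detected at m
            return max(stats, key=lambda s: s[2])[0]     # assumed: maximize T2 over the window
    return n

rng = random.Random(1)
sample = [(1.0 - rng.random()) ** -1.0 for _ in range(400)]   # standard Pareto draws
print(adaptive_k(sample, k0=20))
```

The default values of `rho`, `delta` and `z` mirror the choices reported in Section 6.1; prefix sums of the log order statistics keep each evaluation of the statistic at constant cost.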

Remark 3.1. In the case of Pareto observations the test statistics (3.3) and (3.4) do not depend on the parameter of the Pareto law. This suggests computing the critical values $\mathfrak z_n$ by Monte Carlo simulations from the homogeneous model with i.i.d. standard Pareto observations. Our simulations show that the proposed adaptive procedure is sensitive to some extent to $\rho$, while being less sensitive to $\delta$, $k_0$ and $K_n$. The choice of these parameters is discussed in Section 6. The reason for introducing the parameter $K_n$ is to speed up the numerical execution of the adaptive choice. In order to simplify the formulations and the proofs of the results, in the sequel we shall consider only the case $K_n = n$, which means that $\tau$ will be chosen among all order statistics $X_{n,1}, \ldots, X_{n,n}$.

4. Main results. Recall that $\hat n_t$ is the number of observations in the interval $(t, \infty)$. Let $n_t = n(1 - F(t))$ be the expected number of observations in the same interval. Note that $\hat\theta_{n,t} = h_{n,\hat n_t}$, $t \ge x_0$, by (1.2) and (2.6).

Thereafter $\mathbf P_F$ and $\mathbf E_F$ denote the probability and the expectation pertaining to the i.i.d. observations $X_1, \ldots, X_n$ with common d.f. $F$. For any equivalent probability measures $P$ and $Q$ we denote by $\mathcal K(P, Q) = \mathbf E_P \log \frac{dP}{dQ}$ the Kullback-Leibler divergence and by $\chi^2(P, Q) = \int \frac{dP}{dQ}\, dP - 1$ the $\chi^2$-divergence. A simple application of Jensen's inequality shows that

$0 \le \mathcal K(P, Q) \le \log(1 + \chi^2(P, Q))$.
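For two Pareto laws both divergences are available in closed form, which gives a quick numerical check of the Jensen bound. In the sketch below the expression for the $\chi^2$-divergence, $\chi^2(P_a, P_b) = (a - b)^2 / (a(2b - a))$ for $a < 2b$ and $+\infty$ otherwise, is derived for this illustration by direct integration of the Pareto densities; it is not quoted from the text, and the function names are assumptions.

```python
import math

def pareto_kl(a, b):
    """K(P_a, P_b) = G(a/b - 1) with G(x) = x - log(1 + x), see (2.5)."""
    r = a / b - 1.0
    return r - math.log1p(r)

def pareto_chi2(a, b):
    """chi^2(P_a, P_b) for Pareto indices a, b > 0 (closed form derived for this sketch)."""
    if a >= 2.0 * b:
        return math.inf            # the defining integral diverges
    return (a - b) ** 2 / (a * (2.0 * b - a))

pairs = [(1.0, 2.0), (1.5, 1.0), (0.9, 1.0), (1.0, 1.0), (3.0, 1.0)]
for a, b in pairs:
    k = pareto_kl(a, b)
    bound = math.log1p(pareto_chi2(a, b))
    print(f"a={a}, b={b}: K={k:.4f} <= log(1 + chi2)={bound:.4f}")
```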

We shall measure the discrepancy between two possible values $\theta_1 > 0$ and $\theta_2 > 0$ of the Pareto index in terms of the Kullback-Leibler divergence $\mathcal K(\theta_1, \theta_2)$ between two Pareto measures, see (2.5).
4.1. Rates of convergence of nonadaptive estimators. We say that the d.f. $F$ admits an accompanying Pareto tail with tail index function $\theta_t$, $t \ge x_0$, if for any $t \ge x_0$ there exists an index $\theta_t > 0$ such that $\theta_t$ is a continuous function of $t$ and

(4.1) $\lim_{t \to \infty} \chi^2(F_t, P_{\theta_t}) = 0$.

This definition can be viewed as an extension of the regular variation condition (1.1). Instead of requiring the existence of the limit it stipulates that $F_t$ admits an accompanying Pareto law $P_{\theta_t}$ with a parameter $\theta_t > 0$ possibly changing with $t$. The class of d.f.'s satisfying (4.1) is very large. For instance the d.f.'s satisfying the Hall condition (3.1), the log-gamma d.f. and Pareto d.f.'s with logarithmic-type perturbations are of this type. We refer to Section 5, where $\theta_t$ is explicitly computed for these examples. The class of distributions defined by (4.1) includes d.f.'s which are not regularly varying. Examples are normal and exponential d.f.'s with some $\theta_t \to 0$ as $t \to \infty$.

It is easy to see that if the d.f. $F$ admits an accompanying Pareto tail with tail index function $\theta_t$, $t \ge x_0$, then there exists a sequence $\{\tau_n\}$ such that

(4.2) $\chi^2(F_{\tau_n}, P_{\theta_{\tau_n}}) = O\left(\frac{\log n}{n(1 - F(\tau_n))}\right) \to 0$ as $n \to \infty$.

For the sake of brevity, a sequence of locations $\{\tau_n\}$ satisfying (4.2) is said to be admissible.

Theorem 4.1. Assume that the d.f. $F$ admits an accompanying Pareto tail with tail index function $\theta_t$, $t \ge x_0$. Then, for any admissible sequence of locations $\{\tau_n\}$,

$\mathcal K(\hat\theta_{n,\tau_n}, \theta_{\tau_n}) = O_{\mathbf P_F}\left(\frac{\log n}{n(1 - F(\tau_n))}\right)$ as $n \to \infty$.

Here and in the sequel the constant in $O_{\mathbf P_F}$ depends only on the constant in $O$ in (4.2). This theorem is an immediate consequence of the more general Theorem 4.5 formulated below.

Corollary 4.2. Assume that $F$ admits an accompanying Pareto tail with a constant tail index function $\theta_t = \gamma$, $t \ge x_0$. Then by Theorem 4.1,

(4.3) $\mathcal K(\hat\theta_{n,\tau_n}, \gamma) = O_{\mathbf P_F}\left(\frac{\log n}{n(1 - F(\tau_n))}\right)$ as $n \to \infty$,

with $\tau_n$ satisfying (4.2).
For any $t \ge x_0$ we "project" $F_t$ on the set $\mathcal P$ by choosing the closest element to $F_t$ in the set of Pareto d.f.'s $\mathcal P = \{P_\theta : \theta > 0\}$, say $P_{\theta_t(F)}$, where

$\theta_t(F) = \arg\min_{\theta > 0} \mathcal K(F_t, P_\theta)$.

The parameter $\theta_t(F)$ will be called in the sequel the fitted Pareto index. It can be easily computed and has the following explicit expression (see Figure 2 for a graphical representation):

(4.4) $\theta_t(F) = \int_1^\infty \log x\, F_t(dx) = \int_t^\infty \log\frac{x}{t}\, \frac{F(dx)}{1 - F(t)}$, $t \ge x_0$.

In cases when (4.1) holds and $F$ is regularly varying at $\infty$ with index of regular variation $\gamma$, it is easy to verify that $\theta_t(F) \to \gamma$ as $t \to \infty$.

Corollary 4.3. Assume that $F$ admits an accompanying Pareto tail with tail index function $\theta_t = \theta_t(F)$, $t \ge x_0$. Then according to Theorem 4.1

(4.5) $\mathcal K(\hat\theta_{n,\tau_n}, \theta_{\tau_n}(F)) = O_{\mathbf P_F}\left(\frac{\log n}{n(1 - F(\tau_n))}\right)$ as $n \to \infty$,

where $\tau_n$ satisfies (4.2).

Corollaries 4.2 and 4.3 can be compared with the consistency results for the Hill estimator established by Mason [18] (see also Hall [10]). Recall the main result of [18]. If $F$ is regularly varying with index of regular variation $\gamma$ and $k_n$ satisfies $k_n \to \infty$ and $k_n/n \to 0$ as $n \to \infty$, then

(4.6) $h_{n,k_n} \to \gamma$ in probability as $n \to \infty$.

Our Corollary 4.3 improves upon this result by stating that if $F$ admits an accompanying Pareto tail with tail index function $\theta_t = \theta_t(F)$, $t \ge x_0$, then for any $\tau_n$ satisfying (4.2),

(4.7) $h_{n,\hat n_{\tau_n}} - \theta_{\tau_n}(F) = \hat\theta_{n,\tau_n} - \theta_{\tau_n}(F) \to 0$ in probability as $n \to \infty$.

Fig. 2. Fitted Pareto index $\theta_t(F)$: (1) Pareto d.f.; (2) log-perturbed Pareto d.f. (5.3); (3) log-gamma d.f.; (4) Cauchy d.f.; (5), (6), (7) Hall model (5.4).

A comparison of the precision of the approximations (4.6) and (4.7) is given in Figure 1, where the realizations of the estimator $h_{n,k}$ are plotted as processes in $k$ along with the fitted Pareto index $\theta_t(F)$, for $t = X_{n,1}, \ldots, X_{n,n}$. The underlying d.f. $F(x)$ is the log-gamma one. From these graphs it is seen that for finite sample sizes the Hill estimator $h_{n,k}$ provides a satisfactory approximation of the quantity $\theta_{X_{n,k}}(F)$ while staying far away from the solid straight line corresponding to the parameter of regular variation $\gamma = 1$, except in the cases when the fitted Pareto index itself is close to $\gamma$. These conclusions are also confirmed by the simulation results reported in Figure 3.

Note that the fitted Pareto index $\theta_t(F)$ coincides with the mean value of the function $\alpha_F$ [see (2.2)] on the interval $[t, \infty)$ w.r.t. $F_t$:

$\theta_t(F) = \int_1^\infty \alpha_F(tx)\, F_t(dx) = \int_t^\infty \alpha_F(x)\, \frac{F(dx)}{1 - F(t)}$.

For numerical computations of the value $\theta_t(F)$ one can use the following approximation formula:

(4.8) $\theta_{X_{n,k}}(F) \approx \frac{1}{k} \sum_{i=1}^{k} \alpha_F(X_{n,i})$.

Now we shall present an application of the bound (4.5) to the estimation of the excess d.f. $F_{\tau_n}$.

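Theorem 4.4. Assume that the d.f. $F$ admits an accompanying Pareto tail with tail index function $\theta_t = \theta_t(F)$, $t \ge x_0$. Then, for any admissible sequence of locations $\{\tau_n\}$,

$\mathcal K(F_{\tau_n}, P_{\hat\theta_{n,\tau_n}}) = O_{\mathbf P_F}\left(\frac{\log n}{n(1 - F(\tau_n))}\right)$ as $n \to \infty$.

Proof. For any $\theta > 0$ and any $s \ge x_0$,

(4.9) $\mathcal K(F_s, P_\theta) = \mathcal K(F_s, P_{\theta_s(F)}) + \mathcal K(\theta_s(F), \theta)$.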
Fig. 3. (1) 100 realizations of the Hill estimator $h_{n,k}$, $k = 1, \ldots, n$, for the Cauchy d.f. (top left), the Pareto change-point d.f. (top right), and the Hall model (5.4) (bottom left: $\alpha = 1$, $\beta = 3$, $c = 1.8$; bottom right: $\alpha = 1$, $\beta = 1.2$, $c = 1.8$). (2) Index of regular variation $\gamma = 1$ which is expected to be estimated. (3) The fitted Pareto index $\theta_t(F)$ computed from (4.4), (4.8).

The identity (4.9) follows immediately from the decomposition

$\mathcal K(F_s, P_\theta) = \mathcal K(F_s, P_{\theta_s(F)}) + \int_1^\infty \log \frac{dP_{\theta_s(F)}}{dP_\theta}\, dF_s$

and from

$\int_1^\infty \log \frac{dP_{\theta_s(F)}}{dP_\theta}\, dF_s = \mathcal K(\theta_s(F), \theta)$.

Using (4.9), one gets

$\mathcal K(F_{\tau_n}, P_{\hat\theta_{n,\tau_n}}) = \mathcal K(F_{\tau_n}, P_{\theta_{\tau_n}(F)}) + \mathcal K(\theta_{\tau_n}(F), \hat\theta_{n,\tau_n})$.

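Since by Lemma 8.1 the divergence $\mathcal K(\theta_1, \theta_2)$ is bounded by a constant multiple of $\mathcal K(\theta_2, \theta_1)$, the assertion follows from the convergence result (4.5) and from the inequality

$\mathcal K(F_{\tau_n}, P_{\theta_{\tau_n}(F)}) \le \log(1 + \chi^2(F_{\tau_n}, P_{\theta_{\tau_n}(F)})) = O\left(\frac{\log n}{n(1 - F(\tau_n))}\right)$

as $n \to \infty$. □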

The previous results are based on the following more general bound which is a simple application of an exponential bound for the maximum of the likelihood ratio.

Theorem 4.5. Assume that $\{\tau_n\}$ is a sequence such that $\tau_n \ge x_0$ and $\lim_{n\to\infty} n(1 - F(\tau_n)) = \infty$. Then for any sequence $\{\theta_n\}$ of positive numbers it holds

$\mathcal K(\hat\theta_{n,\tau_n}, \theta_n) = O_{\mathbf P_F}\left(\frac{\log n}{n(1 - F(\tau_n))} + \chi^2(F_{\tau_n}, P_{\theta_n})\right)$ as $n \to \infty$,

with an absolute constant in $O_{\mathbf P_F}$.

Proof. Letting $t = s = \tau_n$, $\theta = \theta_n$, $y = 4\log n + \hat n_{\tau_n} \chi^2(F_{\tau_n}, P_{\theta_n})$, by the first inequality of Proposition 7.3 one gets

$\mathcal K(\hat\theta_{n,\tau_n}, \theta_n) = O_{\mathbf P_F}\left(\frac{\log n}{\hat n_{\tau_n}} + \chi^2(F_{\tau_n}, P_{\theta_n})\right)$ as $n \to \infty$.

To finish the proof we use the fact that by Lemma 8.3 it holds $\hat n_{\tau_n} \asymp n_{\tau_n}$ in probability as $n \to \infty$, whenever $\lim_{n\to\infty} n_{\tau_n} = \infty$. □

The rate of convergence $\frac{\log n}{n(1 - F(\tau_n))}$ involved in the previous theorems depends on the unknown d.f. $F$ and on the unknown location $\tau_n$. The best possible rate of convergence for a given $F$ is obtained by choosing $\tau_n$ from the balance equation (4.2). Explicit calculations of the resulting rates of convergence for some d.f.'s $F$ are given in Section 5.

4.2. Stability property of the test statistic. In the sequel it is assumed that $\mathfrak z_n = \mu \log n$, where $\mu > 0$ is a constant. We say that the location $t$ is accepted by the testing procedure if $X_{n,r} \ge t$ implies $T_{n,r} \le \mathfrak z_n$. Set $\{t \text{ is accepted}\} = \Omega_{n,t} = \bigcap_{X_{n,r} \ge t} \{T_{n,r} \le \mathfrak z_n\}$.

Theorem 4.6. Assume that the d.f. $F$ admits an accompanying Pareto tail with tail index function $\theta_t$, $t \ge x_0$, and $\{\tau_n\}$ is an admissible sequence of locations. Then there exists a finite positive constant $\mu$ such that

$\mathbf P_F(\tau_n \text{ is accepted}) = \mathbf P_F(\Omega_{n,\tau_n}) \to 1$ as $n \to \infty$.

Proof. First note that by Proposition 7.5 

Jr" F sup Tn{t,T)>z) <2n'^exp(-y/2) + -<-, 
\t„<s<t J n n 
where $y = 16 \log n$ and $z = 2y + 2 n_{\tau_n} \chi^2(F_{\tau_n}, P_{\theta_{\tau_n}})$. On the other hand, by (4.2), $z \le \mathfrak z_n$ for some constant $\mu$ and $n$ sufficiently large. Consequently $\Omega^c_{n,\tau_n} \subseteq \{\sup_{\tau_n \le t < \tau} T_n(t, \tau) > z\}$. This implies $\lim_{n\to\infty} \mathbf P_F(\Omega^c_{n,\tau_n}) = 0$. □

Remark 4.7. From the preceding proof it can be easily seen that the constant $\mu$ in the definition of the critical value $\mathfrak z_n$ depends only on the constant involved in the definition of $O$ in (4.2), say $\lambda$. A simple tracking of
constants shows that a crude upper bound for is 32 + 2Xe^. 

4.3. Rates of convergence of the adaptive estimator. First we compare the performance of the adaptive estimator $\hat\theta_n$ with that of the nonadaptive estimator $\hat\theta_{n,\tau_n}$.

Theorem 4.8. Assume that the d.f. $F$ admits an accompanying Pareto tail with tail index function $\theta_t$, $t \ge x_0$, and $\{\tau_n\}$ is an admissible sequence of locations. Then there exists a constant $\mu > 0$ such that

$\mathcal K(\hat\theta_n, \hat\theta_{n,\tau_n}) = O_{\mathbf P_F}\left(\frac{\log n}{n(1 - F(\tau_n))}\right)$ as $n \to \infty$.

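Proof. Let $\Omega^*_{n,\tau_n} = \Omega_{n,\tau_n} \cap \{T_{n,k_0} \le \mathfrak z_n\}$. Since by Theorem 4.6 $\mathbf P_F(\Omega_{n,\tau_n}) \to 1$ as $n \to \infty$ and by Lemma 8.4 $\mathbf P_F(X_{n,k_0} \ge \tau_n) \to 1$ as $n \to \infty$, it holds

(4.10) $\mathbf P_F(\Omega^*_{n,\tau_n}) \ge \mathbf P_F(\Omega_{n,\tau_n} \cap \{X_{n,k_0} \ge \tau_n\}) \to 1$ as $n \to \infty$.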

Denote m„ = m„, — 1 . By the definition of m„ on the set fl* it holds 
fnn > n-r^ (see Section 3.2). We split the further proof into two parts. 

Pirst we shall compare h^ and h^ . To this end define the sequence 
of natural numbers mj, i = 0, 1, . . . , i*, such that mo = fhn and is the 
smallest natural number exceeding mj_i/2 for i = 1,2, . . . ,i* , where i* such 
that pmi* < nr„ < (1 — 5)mi*. Let m,j*+i = n^-^. Since, on the set 0* 

(4.11) T'n.fc <3n = ^logn for k£TZn,k< fhn, 
by (3.3), with s = Xn,mi-i <t = Xn,mi, one gets 

'miK,(hn,m,^i,hn,m,) < Tn,mi_i,m, < /ilogn, i = 1, . . . , i* + 1, 

which in turn implies 



^]/}C{hn,ni,^,,hn,m,) < Ai^^^ log^/^n^m^ 



-1/2 

i=l i=l 

Taking into account that > mj„i/2, for i = 1, . . . ,i* , we obtain 

1=1 i=l 
Since $\hat n_{\tau_n} \le m_{i^*}$, by Lemma 8.2, on the set $\Omega^*_{n,\tau_n}$ it holds



(4.12) 

^ 3-4.5 i/alogi/'n 



-1/2 



Now we shall compare h ~ and /i r . Recall that by the definition kr, is 

a natural number satisfying pfhn < kn < (1 — S)mn < fhn (see Section 3.2). 
Then, on the set ^2* (4.11) implies ~ ^ <3n- Since on the same set 
it holds fhn > nr„ , we get 



n 



,1/2 

"In' 



Summing (4.12) and (4.13), by Lemma 8.2 it follows that on the set $7* 



(4-14) y^(\,^„>^n.J<M^/^^/2 ' 

where $c$ is an absolute constant. Taking into account (4.10),

$\mathbf P_F\left(\mathcal K(\hat\theta_n, \hat\theta_{n,\tau_n}) \le c\mu \frac{\log n}{\hat n_{\tau_n}}\right) \to 1$ as $n \to \infty$.

To get the requested assertion it suffices to replace the random rate of convergence $\hat n_{\tau_n}$ with the deterministic rate $n_{\tau_n} = n(1 - F(\tau_n))$ by Lemma 8.3. □

Combining Theorem 4.8 with Theorem 4.1 one gets the following assertion:

Theorem 4.9. Assume that the d.f. $F$ admits an accompanying Pareto tail with tail index function $\theta_t$, $t \ge x_0$, and $\{\tau_n\}$ is an admissible sequence of locations. Then there exists a constant $\mu > 0$ such that

$\mathcal K(\hat\theta_n, \theta_{\tau_n}) = O_{\mathbf P_F}\left(\frac{\log n}{n(1 - F(\tau_n))}\right)$ as $n \to \infty$.

In particular if condition (4.1) is fulfilled with $\theta_t = \theta_t(F)$ one gets

(4.15) $\mathcal K(\hat\theta_n, \theta_{\tau_n}(F)) = O_{\mathbf P_F}\left(\frac{\log n}{n(1 - F(\tau_n))}\right)$ as $n \to \infty$.
Another case of interest is when $F$ is regularly varying with index of regular variation $\gamma > 0$. Assume that condition (4.1) is satisfied with $\theta_t = \gamma$, $t \ge x_0$. Then

(4.16) $\mathcal K(\hat\theta_n, \gamma) = O_{\mathbf P_F}\left(\frac{\log n}{n(1 - F(\tau_n))}\right)$ as $n \to \infty$.

Now we are in a position to formulate the result concerning the approximation of the excess d.f. $F_{\tau_n}$.

Theorem 4.10. Assume that the d.f. $F$ admits an accompanying Pareto tail with tail index function $\theta_t = \theta_t(F)$, $t \ge x_0$, and $\{\tau_n\}$ is an admissible sequence of locations. Then there exists a constant $\mu > 0$ such that

$\mathcal K(F_{\tau_n}, P_{\hat\theta_n}) = O_{\mathbf P_F}\left(\frac{\log n}{n(1 - F(\tau_n))}\right)$ as $n \to \infty$.

Proof. The proof is similar to that of Theorem 4.4. The only changes are that $\hat\theta_n$ replaces $\hat\theta_{n,\tau_n}$ and that one uses (4.15) instead of (4.5). □

5. Computation of the rates of convergence. In this section we shall 
compute explicitly optimal rates of convergence in two particular cases. 

Introduce the distance /3*(x, y) = max{| log ||, | ^ — ^|}, x, y > 0. From Propo- 
sition 8.6 it follows that the sequence {t„} is admissible if there exists a 
function t — > 0j such that 

(5-1) pI^= ^^^V P*{(^F{x),erS = 0[— E^TT^l^O asn^oo, 

X>Tn 

roo 



Xl-F{Tn)), 

POO 

(5.2) sup / (l + logx)V«F,„(d2;) = 0(l) 

m>n J 1 



as n ^ 00. 



In turn this implies that the conclusions of Section 4 hold true. The optimal rate corresponds to the minimal location $\tau_n$ satisfying (5.1).

5.1. Perturbed Pareto model. Assume that $F$ has the form

(5.3) $F(x) = 1 - c_\beta x^{-1/\beta} \log x$, $x \ge x_0 > e$,

where $\beta \ge \beta_0 > 0$, $x_0$ and $c_\beta$ are chosen such that $F(x)$ is strictly monotone and $F(x_0) = 0$. By straightforward calculations $\theta_t(F) = \beta\left(1 + \frac{\beta}{\log t}\right)$ and $\alpha_F(x) = \beta\left(1 - \frac{\beta}{\log x}\right)^{-1}$. Since $\rho_t = O\left(\frac{1}{\log t}\right)$ and $1 - F(t) = c_\beta t^{-1/\beta} \log t$, for determining the nearly optimal location we get the balance condition

$\frac{1}{\log^2 \tau_n} \asymp \frac{\log n}{n c_\beta \tau_n^{-1/\beta} \log \tau_n}$.

With the optimal choice $\tau_n \asymp \left(\frac{n}{\log^2 n}\right)^{\beta}$, one gets $\frac{\log n}{n(1 - F(\tau_n))} = O(\log^{-2} n)$ as $n \to \infty$. On the other hand condition (5.2) is satisfied since, by (2.3), $f_{F_{\tau_n}}(x) \le c x^{-1/\beta - 1 + \varepsilon}$, for some $\varepsilon \in (0, 1)$. Then according to the results in Section 4, $\mathcal K(\theta_{\tau_n}(F), \hat\theta_{n,\tau_n})$ and $\mathcal K(\theta_{\tau_n}(F), \hat\theta_n)$ are $O_{\mathbf P_F}(\log^{-2} n)$ as $n \to \infty$. Taking $\theta_t = \beta$, in the same way one shows that $\mathcal K(\beta, \hat\theta_{n,\tau_n})$ and $\mathcal K(\beta, \hat\theta_n)$ are $O_{\mathbf P_F}(\log^{-2} n)$ as $n \to \infty$. According to the results in Drees [6], Theorem 2.1, the best achievable rate for estimating $\beta$ in norm is $\frac{1}{\log n}$, in a certain class of d.f.'s which includes the d.f. $F$ satisfying (5.3) (we refer to Drees [6] for details). Our estimators $\hat\theta_{n,\tau_n}$ and $\hat\theta_n$ attain the same rate.

For the log-gamma d.f. we obtain the same rate of convergence since it has essentially the same behavior as the d.f. $F$ defined by (5.3).
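The fitted Pareto index of the perturbed Pareto model can be checked numerically. The sketch below is an illustration under stated assumptions: the closed forms $\theta_t(F) = \beta(1 + \beta/\log t)$ and $\alpha_F(x) = \beta/(1 - \beta/\log x)$ are obtained here from (5.3), (2.2) and (4.4), and the quadrature uses the identity $\int_1^\infty \log x\, F_t(dx) = \int_0^\infty (1 - F_t(e^u))\, du$ with $1 - F_t(x) = x^{-1/\beta}(1 + \log x/\log t)$. Function names and the numerical values of $t$ and $\beta$ are hypothetical.

```python
import math

def fitted_index_closed_form(t, beta):
    """theta_t(F) = beta * (1 + beta / log t) for the model (5.3), t > 1."""
    return beta * (1.0 + beta / math.log(t))

def alpha_F(x, beta):
    """alpha_F(x) = 1 / (x * hazard rate) for the model (5.3); requires log x > beta."""
    return beta / (1.0 - beta / math.log(x))

def fitted_index_numeric(t, beta, steps=200000, umax=60.0):
    """Midpoint quadrature of int_0^inf (1 - F_t(e^u)) du, truncated at umax * beta."""
    log_t = math.log(t)
    h = umax * beta / steps
    total = 0.0
    for j in range(steps):
        u = (j + 0.5) * h
        total += math.exp(-u / beta) * (1.0 + u / log_t)   # survival function of log(X/t)
    return total * h

t, beta = 100.0, 1.5
print(fitted_index_closed_form(t, beta), fitted_index_numeric(t, beta))
print(alpha_F(1e6, beta), alpha_F(1e12, beta))   # approaches beta slowly, at rate 1/log x
```

The logarithmic approach of both quantities to $\beta$ is what produces the slow $\log^{-2} n$ rate in this model.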

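5.2. Hall model. Assume that $F$ is of the form

(5.4) $F(x) = 1 - c_\beta x^{-1/\beta} - c_\gamma x^{-1/\gamma}$, $x \ge x_0$,

where $\beta = \gamma + \alpha \ge \beta_0 > 0$, $\alpha, \gamma > 0$ and $x_0$, $c_\beta$ and $c_\gamma$ are such that $F(x)$ is increasing on $[x_0, \infty)$. Also, though it is not exactly the model proposed by Hall, we shall call it the Hall model. By straightforward calculations

$\theta_t(F) = \frac{\beta c_\beta t^{-1/\beta} + \gamma c_\gamma t^{-1/\gamma}}{c_\beta t^{-1/\beta} + c_\gamma t^{-1/\gamma}}$ and $\alpha_F(x) = \frac{c_\beta x^{-1/\beta} + c_\gamma x^{-1/\gamma}}{\beta^{-1} c_\beta x^{-1/\beta} + \gamma^{-1} c_\gamma x^{-1/\gamma}}$.

It is easy to check that $\rho_{\tau_n} = O(\tau_n^{1/\beta - 1/\gamma})$ as $n \to \infty$. Since $1 - F(t) = c_\beta t^{-1/\beta} + c_\gamma t^{-1/\gamma}$, for determining the nearly optimal location $\tau_n$ we get the balance condition

$\tau_n^{-2/\gamma + 2/\beta} \asymp \frac{\log n}{n\, \tau_n^{-1/\beta}}$.

The optimal choice $\tau_n \asymp \left(\frac{n}{\log n}\right)^{\beta\gamma/(2\beta - \gamma)}$ implies $\frac{\log n}{n(1 - F(\tau_n))} = O\left(\left(\frac{\log n}{n}\right)^{2\alpha/(\beta+\alpha)}\right)$ as $n \to \infty$. As in the previous example one can show that (5.2) is satisfied. Then according to the results in Section 4, $\mathcal K(\theta_{\tau_n}(F), \hat\theta_{n,\tau_n})$ and $\mathcal K(\theta_{\tau_n}(F), \hat\theta_n)$ are $O_{\mathbf P_F}\left(\left(\frac{\log n}{n}\right)^{2\alpha/(\beta+\alpha)}\right)$ as $n \to \infty$. Taking $\theta_t = \beta$ we have that $\mathcal K(\beta, \hat\theta_{n,\tau_n})$ and $\mathcal K(\beta, \hat\theta_n)$ are $O_{\mathbf P_F}\left(\left(\frac{\log n}{n}\right)^{2\alpha/(\beta+\alpha)}\right)$ as $n \to \infty$. By the results in Hall and Welsh [12] (see also Drees [6], Theorem 2.1, for a more general result), the optimal rate of convergence that can be achieved for estimating $\alpha = 1/\beta$ is $n^{-\rho/(2\rho+1)}$, in the class of d.f.'s $F$ having the density (3.1). The d.f. $F$ defined by (5.4) satisfies this condition with $\gamma = \beta(1 + \rho)^{-1}$. Since $\frac{\rho}{2\rho + 1} = \frac{\alpha}{2\alpha + \gamma} = \frac{\alpha}{\beta + \alpha}$, the estimators $\hat\theta_{n,\tau_n}$ and $\hat\theta_n$ attain this rate for $\beta$ up to an additional $\log^{\alpha/(\beta+\alpha)} n$ factor.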

6. Numerical results. 



6.1. Choice of the parameters of the adaptive procedure. An important parameter in the proposed adaptive procedure is the sequence of critical values $\mathfrak z_n$. According to Remark 3.1 the test statistic does not depend on the parameter of the Pareto law if the observations follow a Pareto model. Therefore we propose to compute the critical values $\mathfrak z_n$ by Monte Carlo simulations from the homogeneous model with i.i.d. standard Pareto observations $X_i$, $i = 1, \ldots, n$.

We simulated 2000 realizations with three different sample sizes $n = 200, 500, 1000$ and with the grid length $K_n$ set to 200. The size of the testing windows in $T_{n,m}$ is determined by $\rho = 1/4$ and $\delta = 1/20$. The empirical d.f. of the statistic $T_n = \max_{m = r_1, \ldots, r_{K_n}} T_{n,m}$ has been computed and it was found that in all simulations the critical value $\mathfrak z_n = 10$ corresponds to a 99% confidence level. The same critical value $\mathfrak z_n = 10$, corresponding to a 99% confidence level, has been found from 2000 realizations with $n = 1000$, $\rho = 1/4$, $\delta = 1/20$ and with different grid lengths $K_n = 100, 200, 300$. Additional simulations show that the finite sample properties of the test statistic $T_n$ depend very little on the parameters $k_0$ and $\delta$. The value $\mathfrak z_n = 10$, which approximately corresponds to a 99% confidence level in all cases, and the grid length $K_n = 200$ have been retained.

Further simulations show that the finite sample performance of the adaptive
estimator depends mainly on the parameter p, which plays the same role
as the bandwidth in nonparametric kernel density estimation. The choice
of p, in turn, depends on the class of functions at hand. In the simulations
below we fix the following values: $\delta = 1/20$, $k_0 = n/20$, $K_n = 200$, $\mathfrak{z}_n = 10$.
The value of p is fixed to 1/4. This choice is motivated by the desire
to minimize the relative mean squared error for some given heavy-tailed laws.
In the simulations below we shall consider the following distributions: (1)
The positive part of the Cauchy d.f. $F(x) = \frac{2}{\pi}\arctan x$, $x \ge 0$. (2) Log-gamma
d.f. $F(x) = G_{1,2}(\log x)$, $x \ge 1$, where $G_{\lambda,\alpha}(x)$, $x \ge 0$, is the gamma d.f. with
parameters $\lambda, \alpha > 0$. (3) Log perturbed Pareto d.f. $F(x) = 1 - x^{-1}\log x$,
X > xq = e. (4) Hall's model F{x) = 1 — 2x~^ + x > xq > 0, where xq 

satisfies 2xq^ - Xq = 1. (5) GPD F(x) = 1-{1+ x)~\ x > 0. 
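
As an illustration of how observations from some of the laws listed above could be generated, the following is a minimal sketch using only the Python standard library. It covers the positive part of the Cauchy law, the log-gamma law with rate 1 and shape 2, and the GPD $F(x) = 1 - (1+x)^{-1}$. The function names and the choice of inverse transform sampling are assumptions made for this sketch and are not taken from the simulation code behind the reported results.

```python
import math
import random


def sample_cauchy_positive(rng):
    """Positive part of the Cauchy law, F(x) = (2/pi) * arctan(x), by inversion."""
    return math.tan(0.5 * math.pi * rng.random())


def sample_log_gamma(rng, rate=1.0, shape=2.0):
    """Log-gamma law: log X is gamma distributed with the given rate and shape."""
    # random.gammavariate takes (shape, scale), so the scale is 1 / rate.
    return math.exp(rng.gammavariate(shape, 1.0 / rate))


def sample_gpd(rng):
    """GPD with F(x) = 1 - (1 + x)^(-1), x >= 0, by inversion."""
    return 1.0 / (1.0 - rng.random()) - 1.0


def draw(sampler, n, seed=0):
    """Draw n i.i.d. observations with a reproducible seed."""
    rng = random.Random(seed)
    return [sampler(rng) for _ in range(n)]
```

For instance, `draw(sample_gpd, 1000)` returns one simulated sample of size n = 1000.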

6.2. Estimation of extreme quantiles. We shall demonstrate the performance
of the adaptive estimator $\hat\theta_n$ by presenting the results of a
simulation study for estimating extreme quantiles. We consider two opposite
cases: observations from d.f.'s whose tails are close to a Pareto model
in the range of the large order statistics (such as the Cauchy d.f., the GPD and
some of the Hall models), and observations from d.f.'s whose tails are not well
approximated by a Pareto model in the range of the large order statistics, at
least for samples of reasonable size (such as the log-gamma d.f. and the log
perturbed Pareto d.f.). For many d.f.'s our simulations show a behavior in
between these two types. We performed 2000 Monte Carlo simulations of n = 1000
observations. The quantiles of F are estimated by solving for $x \ge \tau_n$ in the
following approximation formula:

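
$1 - \left(\frac{x}{\tau_n}\right)^{-1/\hat\theta_{n,\tau_n}} = P_{\hat\theta_{n,\tau_n}}\!\left(\frac{x}{\tau_n}\right) \approx F_{\tau_n}\!\left(\frac{x}{\tau_n}\right) = 1 - \frac{1-p}{1-F(\tau_n)}.$

If $x < \tau_n$ we determine x from the equality p = F(x). The unknown location
parameter $\tau_n$ has to be replaced with the adaptive value $\hat\tau_n = X_{n,\hat k_n}$ and F
with the empirical d.f. $F_n$, which leads to the following adaptive estimate of
the quantiles of F:

(6.1) $\hat q_{n,p} = \begin{cases} X_{n,[n(1-p)]}, & \text{if } p < 1 - \hat k_n/n, \\ X_{n,\hat k_n}\left(\frac{\hat k_n}{n(1-p)}\right)^{\hat\theta_n}, & \text{otherwise.} \end{cases}$
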
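Here and in the sequel $q_{n,k,p}$ denotes the quantile estimator

(6.2) $q_{n,k,p} = \begin{cases} X_{n,[n(1-p)]}, & \text{if } p < 1 - k/n, \\ X_{n,k}\left(\frac{k}{n(1-p)}\right)^{h_{n,k}}, & \text{otherwise,} \end{cases} \qquad p \in (0,1),\ k = 2,\dots,n,$

which combines the sample quantile estimator for low quantiles and the
estimator introduced by Weissman [21] for high quantiles.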
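
The combined estimator can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the order statistics are taken in decreasing order ($X_{n,1}$ is the largest observation), the switch between the two regimes is placed at $p = 1 - k/n$, and the Hill estimator is taken in the form of the average of $\log(X_{n,i}/X_{n,k})$ over $i = 1,\dots,k-1$, which may differ in minor details from definition (1.2). The function names are hypothetical.

```python
import math


def hill(desc, k):
    """Hill-type estimate from the k largest values.

    desc must be sorted in decreasing order; assumed form:
    mean of log(X_(i) / X_(k)) over i = 1, ..., k - 1 (requires k >= 2).
    """
    return sum(math.log(desc[i] / desc[k - 1]) for i in range(k - 1)) / (k - 1)


def quantile_estimate(sample, k, p):
    """Sample quantile for p < 1 - k/n, Weissman-type extrapolation otherwise."""
    n = len(sample)
    desc = sorted(sample, reverse=True)        # desc[0] is the largest observation
    if p < 1.0 - k / n:
        j = max(1, math.floor(n * (1.0 - p)))  # index [n(1 - p)] in 1-based notation
        return desc[j - 1]
    # Extrapolate beyond the k-th largest value using the fitted Pareto tail.
    return desc[k - 1] * (k / (n * (1.0 - p))) ** hill(desc, k)
```

With n = 1000 and k = 50, a call such as `quantile_estimate(sample, 50, 0.9999)` extrapolates beyond the largest observation, while `quantile_estimate(sample, 50, 0.5)` simply returns a sample quantile.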

6.2.1. The performance of the adaptive estimator. For any estimator $\hat a$
of $a$ let

$\sigma^2(\hat a, a) = \frac{1}{n}\sum_{i=1}^{n} \log^2\frac{\hat a_i}{a}$


be the relative mean squared error (RelMSE) of $\hat a$. We compare $\sigma(\hat q_{n,p}, q_p)$
with $\sigma(q_{n,k,p}, q_p)$. Figures 4 and 5 plot these quantities for $p = 1 - 1/n = 0.999$
and $p = 0.9999999$ as functions of k. It is useful to compare the RelMSE
$\sigma(\hat q_{n,p}, q_p)$ with the minimal RelMSE $\min_k \sigma(q_{n,k,p}, q_p)$ as a function of p (see
Figure 6). The ratio $r_{n,p} = \sigma(\hat q_{n,p}, q_p)/\min_k \sigma(q_{n,k,p}, q_p)$, regarded as a
function of p, is plotted in Figure 7 (see also Table 1 for a more precise
evaluation). These simulations show that the proposed adaptive procedure nearly
captures the best choice of k, which depends on the unknown d.f. F.
The procedure gives reasonable results in both cases: for d.f.'s with
Pareto-like tails as well as for d.f.'s which exhibit large perturbations from
such tails. Table 1 suggests that the increase of RelMSE introduced by the adaptive
procedure for estimating high quantiles $q_p$ with $p \in [0.9, 0.9999999999]$ does
not exceed 7%. For the log-perturbed Pareto d.f. the results are very similar to
those for the log-gamma d.f. and therefore are not presented here.
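
The comparison described above can be sketched as follows, assuming that the RelMSE is evaluated as the Monte Carlo average of $\log^2(\hat a_i/a)$ over the simulated estimates $\hat a_i$ and that the ratio is formed on the root scale $\sigma$. The function names and the dictionary layout (one list of simulated estimates per value of k) are hypothetical choices made for this illustration.

```python
import math


def rel_mse(estimates, true_value):
    """Average of log^2(estimate / true_value) over simulated estimates."""
    return sum(math.log(a / true_value) ** 2 for a in estimates) / len(estimates)


def rel_mse_ratio(adaptive_estimates, estimates_by_k, true_value):
    """Ratio of the adaptive sigma to the smallest sigma among fixed-k estimators."""
    best = min(math.sqrt(rel_mse(est, true_value))
               for est in estimates_by_k.values())
    return math.sqrt(rel_mse(adaptive_estimates, true_value)) / best
```

A ratio close to 1 indicates that the adaptive choice loses little with respect to the best fixed k, and a ratio below 1 indicates an improvement.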

We would like to point out that for the GPD the ratio $r_{n,p}$ is even less than
1, which means that the adaptive quantile estimator $\hat q_{n,p}$ improves on the
performance of the individual quantile estimators $q_{n,k,p}$, $k = 2,\dots,n$. This
improvement can be observed for other d.f.'s with an appropriate choice of the
parameter p. The corresponding plots of the ratio $r_{n,p}$ for the log-gamma d.f. with
p = 1/10 and for the Hall model with p = 1/2 are given in Figure 8.

The high variability for extreme quantiles $q_p$, $p > 1 - 0.1/n$ (see Figure
6), is mainly explained by the bias introduced by the Pareto model and less
by the variability introduced by the adaptive procedure. Bias reducing
techniques can be applied under some additional assumptions on the underlying
d.f. F. Our adaptive values $\hat k_n$ and $\hat\theta_n$ can be applied with these types
of bias reduced estimators to construct new adaptive quantile estimators;
however, this issue will not be discussed here. For further details on this
subject we refer to Danielsson et al. [3], Gomes and Oliveira [8]; see also
Chapter 4.7 in Beirlant et al. [1].

6.2.2. Comparison with sample quantiles. For any $k = 1,\dots,n$ the sample
quantile $X_{n,k}$ is considered as an estimate of the true quantile $q_{p_{n,k}}$,
where $p_{n,k} = 1 - k/n$. We shall compare the RelMSE of the adaptive quantiles
$\hat q_{n,p_{n,k}}$ with those of the sample quantiles for $k = 1,\dots,500$ by computing
the ratio $r_{n,k} = \sigma(X_{n,k}, q_{p_{n,k}})/\sigma(\hat q_{n,p_{n,k}}, q_{p_{n,k}})$. The results of the simulations
are reported in Table 2 and Figure 9. They show that there is a substantial
gain in variance if we use (6.1) for estimating large quantiles.

The values reported in Tables 1 and 2 can be used to compare the performance
of the adaptive estimator $\hat\theta_n$ with other adaptive estimators.

Table 1
Values of $r_{n,p}$

p            0.9        0.99       0.999      0.9999     0.99999
Cauchy       1.017966   1.023952   1.041944   1.049905   1.054291
log-gamma    1.042706   1.002527   1.002542   1.013393   1.021253
Hall model   0.996002   1.009698   1.023196   1.030144   1.034276
GPD          1.094321   0.998349   0.989391   0.985767   0.984071

p            0.999999   0.9999999  0.99999999  0.999999999  0.9999999999
Cauchy       1.057159   1.059174   1.060642    1.061758     1.062635
log-gamma    1.026952   1.031355   1.03472     1.037275     1.039637
Hall model   1.036994   1.038913   1.040339    1.041438     1.042312
GPD          0.983118   0.982513   0.982184    0.981981     0.981829

Fig. 4. $Q_h = \sigma(q_{n,k,p}, q_p)$ for $k = 1,\dots,n$ and $Q_{ad} = \sigma(\hat q_{n,p}, q_p)$; $R = r_{n,p}$. Top: Cauchy
observations; Bottom: log-gamma observations.

6.3. Estimation of the index of regular variation. According to (4.16)
the adaptive estimator $\hat\theta_n$ converges to the index of regular variation $\gamma$. The
performance of $\hat\theta_n$ w.r.t. $\gamma$ will be measured using the root



Table 2
Values of $r_{n,k}$

k            1        2        3        4        5        10       20
Cauchy       3.5360   2.4100   2.0294   1.8671   1.7200   1.4226   1.2621
log-gamma    2.7270   1.9417   1.7306   1.5924   1.4971   1.3010   1.2453
Hall model   4.1240   2.7809   2.3237   2.1246   1.9466   1.5772   1.3605
GPD          1.3117   1.2108   2.9563   2.0609   1.7629   1.6422   1.5288

k            30       40       50       60       70       80       90
Cauchy       1.2081   1.1852   1.1849   1.1755   1.1928   1.1860   1.1745
log-gamma    1.1982   1.1724   1.1611   1.1696   1.1642   1.1622   1.1748
Hall model   1.2758   1.2324   1.2183   1.2001   1.2141   1.2088   1.2040
GPD          1.1813   1.1683   1.1675   1.1509   1.1557   1.1327   1.1033

Fig. 5. $Q_h = \sigma(q_{n,k,p}, q_p)$ for $k = 1,\dots,n$ and $Q_{ad} = \sigma(\hat q_{n,p}, q_p)$; $R = r_{n,p}$. Top: observations
from Hall's model; Bottom: GPD observations.

mean squared error (RMSE) $\sigma(\hat\theta_n) = \mathbf{E}_F^{1/2}(\hat\theta_n - \gamma)^2$. The corresponding
simulations of the RMSE's $\sigma(\hat\theta_n)$ and $\sigma(h_{n,k})$ (as a function of k) are presented
in Figure 10. In the case of the Cauchy d.f. the minimal value of the RMSE of the Hill
estimator is $\min_k \sigma(h_{n,k}) = 0.07385$, while the RMSE of the adaptive estimator
is $\sigma(\hat\theta_n) = 0.07899$, which gives the ratio $\sigma(\hat\theta_n)/\min_k \sigma(h_{n,k}) =$





Fig. 6. $\min_k \sigma(q_{n,k,p}, q_p)$ (points) and $\sigma(\hat q_{n,p}, q_p)$ (solid line) as functions of p.

Fig. 7. The ratio $r_{n,p}$ as a function of p.

Fig. 8. The ratio $r_{n,p}$ as a function of p. Adaptive procedure is performed with p = 1/2
(left) and p = 1/10 (right).



1.06966. For the log-gamma d.f. the minimal value of the RMSE of the Hill estimator
is $\min_k \sigma(h_{n,k}) = 0.23112$, while the RMSE of the adaptive estimator is
$\sigma(\hat\theta_n) = 0.24804$, which gives the ratio $\sigma(\hat\theta_n)/\min_k \sigma(h_{n,k}) = 1.07321$.
Thus for Cauchy and log-gamma the adaptive estimator increases the minimal
RMSE in the family of Hill estimators by less than 7.4%.

Fig. 9. Left: $\sigma(X_{n,k}, q_{p_{n,k}})$ (points) and $\sigma(\hat q_{n,p_{n,k}}, q_{p_{n,k}})$ (solid line) as functions of k;
Right: $r_{n,k} = \sigma(X_{n,k}, q_{p_{n,k}})/\sigma(\hat q_{n,p_{n,k}}, q_{p_{n,k}})$ as a function of k; Top: Hall model; Bottom:
log-gamma d.f.



7. Proofs of the exponential bounds. Let $t \ge x_0$. The local log-likelihood
ratio $L_{n,t}(H,G) = L_{n,t}(H) - L_{n,t}(G)$ admits the representation

$L_{n,t}(H,G) = \sum_{X_i > t}\left\{\log\frac{a_G(X_i)}{a_H(X_i)} + \int_{(t,X_i]}\left(\frac{1}{a_G(u)} - \frac{1}{a_H(u)}\right)\frac{du}{u}\right\}.$

Recall the following notations: $n_t = n(1 - F(t))$, $\hat n_t = \sum_{i=1}^{n} 1(X_i > t)$ and
$\hat n_{t,\tau} = \sum_{i=1}^{n} 1(t < X_i \le \tau)$ for $\tau \ge t \ge x_0$.

We start with a bound for the log of the local likelihood ratio.

Proposition 7.1. Let $s \ge x_0$. For any $F, G, H \in \mathcal{F}$ and any $y > 0$ it holds

(7.1) $P_F(L_{n,s}(H,G) > y) \le \exp\left(-\frac{y}{2} + \frac{n_s}{2}d_s\right),$

where $d_s = \chi^2(F_s, G_s)$.

Proof. By the exponential Chebyshev inequality,

$P_F(L_{n,s}(H,G) > y) \le \exp\left(-y/2 + \log \mathbf{E}_F \exp(L_{n,s}(H,G)/2)\right).$

Fig. 10. Estimation of the index of regular variation $\gamma$; $Q_h = \sigma(h_{n,k})$: plot of the
estimated RMSE of the Hill estimator $h_{n,k}$, for $k = 1,\dots,n$; $Q_{ad} = \sigma(\hat\theta_n)$: RMSE of the
adaptive estimator $\hat\theta_n$ w.r.t. $\gamma$.



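Since the r.v.'s $X_1,\dots,X_n$ are i.i.d., one gets

$\log \mathbf{E}_F \exp\left(\tfrac{1}{2}L_{n,s}(H,G)\right) = n \log \mathbf{E}_F \exp\left(\tfrac{1}{2}1_{A_{i,s}} \log\frac{dH_s}{dG_s}\left(\frac{X_i}{s}\right)\right),$

where $A_{i,s} = \{X_i > s\}$. Since $\mathbf{E}_F \exp\left(1_{A_{i,s}} \log\frac{dH_s}{dF_s}\left(\frac{X_i}{s}\right)\right) = 1$, by Hölder's
inequality

$\mathbf{E}_F \exp\left(\tfrac{1}{2}1_{A_{i,s}} \log\frac{dH_s}{dG_s}\left(\frac{X_i}{s}\right)\right) \le \left\{\mathbf{E}_F \exp\left(1_{A_{i,s}} \log\frac{dF_s}{dG_s}\left(\frac{X_i}{s}\right)\right)\right\}^{1/2}.$

Using

$\mathbf{E}_F \exp\left(1_{A_{i,s}} \log\frac{dF_s}{dG_s}\left(\frac{X_i}{s}\right)\right) = F(s) + (1 - F(s))(d_s + 1),$

one gets

$P_F(L_{n,s}(H,G) > y) \le \exp\left(-\frac{y}{2} + \frac{n}{2}\log\{1 + (1 - F(s))d_s\}\right) \le \exp\left(-\frac{y}{2} + \frac{n_s}{2}d_s\right).$ □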

Denote for brevity Ln,tA^' ,9) = Ln,t{0' ,0 ,T,e). 
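Corollary 7.2. For any $\tau \ge t \ge s \ge x_0$ and $\theta', \theta > 0$,

(7.2) $P_F(L_{n,t}(\theta', \theta) > y) \le \exp\left(-\frac{y}{2} + \frac{n_s}{2}d_s\right),$

(7.3) $P_F(L_{n,t,\tau}(\theta', \theta) > y) \le \exp\left(-\frac{y}{2} + \frac{n_s}{2}d_s\right),$

where $d_s = \chi^2(F_s, P_\theta)$.

Proof. The first assertion follows from (7.1) when applied with H and
G such that $a_H(x) = a_G(x) = \theta$ for $x \in [x_0, t)$ and $a_H(x) = \theta'$, $a_G(x) = \theta$
for $x \in [t, \infty)$. The second one is obtained with $a_H(x) = a_G(x) = \theta$ for
$x \in [x_0, t) \cup [\tau, \infty)$ and $a_H(x) = \theta'$, $a_G(x) = \theta$ for $x \in [t, \tau)$. □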

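Proposition 7.3. For any $F \in \mathcal{F}$, $\theta > 0$, $y > 0$, $\tau \ge t \ge s \ge x_0$ it holds

$P_F(\hat n_t \mathcal{K}(\hat\theta_{n,t}, \theta) > y) \le 2n \exp\left(-\frac{y}{2} + \frac{n_s}{2}d_s\right),$

$P_F(\hat n_{t,\tau} \mathcal{K}(\hat\theta_{n,t,\tau}, \theta) > y) \le 2n \exp\left(-\frac{y}{2} + \frac{n_s}{2}d_s\right),$

where $d_s = \chi^2(F_s, P_\theta)$.

Proof. We shall prove only the second inequality, the first one being
proved in the same way.

First note that $\hat n_{t,\tau}\mathcal{K}(\hat\theta_{n,t,\tau}, \theta) = L_{n,t,\tau}(\hat\theta_{n,t,\tau}, \theta)$. For the sake of brevity let
$l_k(\alpha) = (y/k - \log\frac{\theta}{\alpha})/(\frac{1}{\theta} - \frac{1}{\alpha})$, $\alpha > \theta$. Since the function $l_k(\alpha)$ is continuous
in $\alpha$ for $\alpha > \theta$ and $\lim_{\alpha\to\theta} l_k(\alpha) = \infty$, $\lim_{\alpha\to\infty} l_k(\alpha) = \infty$, there exists a finite
point $\alpha_k^* > \theta$ which realizes $\alpha_k^* = \arg\min_{\alpha>\theta} l_k(\alpha)$. Note that $\alpha_k^*$ is a function
only of $k$, $y$, $\theta$. With this notation, on the event $A_k = \{\hat\theta_{n,t,\tau} > \theta,\ \hat n_{t,\tau} = k\}$,
we have that the inequality

$L_{n,t,\tau}(\hat\theta_{n,t,\tau}, \theta) = \hat n_{t,\tau}\left(\log\frac{\theta}{\hat\theta_{n,t,\tau}} - (1/\hat\theta_{n,t,\tau} - 1/\theta)\hat\theta_{n,t,\tau}\right) > y$

is equivalent to $\hat\theta_{n,t,\tau} > l_k(\hat\theta_{n,t,\tau})$ and the inequality $\hat\theta_{n,t,\tau} > l_k(\alpha_k^*)$ is
equivalent to $L_{n,t,\tau}(\alpha_k^*, \theta) > y$. Then

$\{L_{n,t,\tau}(\hat\theta_{n,t,\tau}, \theta) > y\} \cap A_k = \{\hat\theta_{n,t,\tau} > l_k(\hat\theta_{n,t,\tau})\} \cap A_k$
$\subseteq \{\hat\theta_{n,t,\tau} > l_k(\alpha_k^*)\} \cap A_k$
$\subseteq \{L_{n,t,\tau}(\alpha_k^*, \theta) > y\}.$

In the same way

$\{L_{n,t,\tau}(\hat\theta_{n,t,\tau}, \theta) > y\} \cap B_k \subseteq \{L_{n,t,\tau}(\alpha_k^{**}, \theta) > y\},$

where $B_k = \{\hat\theta_{n,t,\tau} \le \theta,\ \hat n_{t,\tau} = k\}$ and $\alpha_k^{**} = \arg\max_{0<\alpha<\theta} l_k(\alpha)$ is a function
only of $k$, $y$, $\theta$. The latter implies

$P_F(L_{n,t,\tau}(\hat\theta_{n,t,\tau}, \theta) > y,\ \hat n_{t,\tau} = k) \le P_F(L_{n,t,\tau}(\alpha_k^*, \theta) > y) + P_F(L_{n,t,\tau}(\alpha_k^{**}, \theta) > y).$

Since by Corollary 7.2,

$P_F(L_{n,t,\tau}(\theta', \theta) > y) \le \exp\left(-\frac{y}{2} + \frac{n_s}{2}d_s\right),$

with $\theta' = \alpha_k^*, \alpha_k^{**}$, one gets

$P_F(L_{n,t,\tau}(\hat\theta_{n,t,\tau}, \theta) > y) = \sum_{k=1}^{n} P_F(L_{n,t,\tau}(\hat\theta_{n,t,\tau}, \theta) > y,\ \hat n_{t,\tau} = k) \le 2n \exp\left(-\frac{y}{2} + \frac{n_s}{2}d_s\right),$

which completes the proof. □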
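
The identity $\hat n_{t,\tau}\mathcal{K}(\hat\theta_{n,t,\tau}, \theta) = L_{n,t,\tau}(\hat\theta_{n,t,\tau}, \theta)$ used at the start of the proof can be checked numerically. The sketch below is an illustration under simplifying assumptions: it works directly with excesses $X_i/t \ge 1$ from an exact Pareto law (so there is no upper truncation at $\tau$), it uses the Kullback-Leibler divergence between two Pareto laws in the closed form $\theta_1/\theta_2 - 1 - \log(\theta_1/\theta_2)$, and the function names are hypothetical.

```python
import math
import random


def pareto_loglik(excesses, theta):
    """Log-likelihood of the Pareto law P_theta(x) = 1 - x^(-1/theta), x >= 1."""
    return sum(-math.log(theta) - (1.0 / theta + 1.0) * math.log(x)
               for x in excesses)


def kullback_leibler(theta1, theta2):
    """Kullback-Leibler divergence between the Pareto laws P_theta1 and P_theta2."""
    r = theta1 / theta2
    return r - 1.0 - math.log(r)


rng = random.Random(0)
theta_true = 0.7
# Inverse transform sampling: U^(-theta) is Pareto distributed with parameter theta.
excesses = [(1.0 - rng.random()) ** (-theta_true) for _ in range(200)]

# Maximum likelihood estimate: the mean of the log-excesses.
theta_hat = sum(math.log(x) for x in excesses) / len(excesses)

theta = 1.0  # reference parameter
lhs = pareto_loglik(excesses, theta_hat) - pareto_loglik(excesses, theta)
rhs = len(excesses) * kullback_leibler(theta_hat, theta)
print(lhs, rhs)  # the two values agree up to rounding error
```

The agreement holds for any reference value `theta`, because the fitted log-likelihood ratio depends on the data only through the mean of the log-excesses.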

Proposition 7.4. For any F e T and s > xq, 9 > 0, y > it holds 

PF[supHtlC{en,t,e)> y^ <2n^exp(^-| + ^4^ 

Pf( sup nt^rK,{9n,t,T,0) > y ) < n^expf - ^ + ^dA + - 
\s<t<T J \ 2 2 J n 

where ds = x^{Fs,Pe)- 

Proof. We shall give a proof only for the second inequality, the first 
one being proved in the same way. 

Let N = and J = {sq, . . . ,sn} be the set of numbers satisfying s,j_i < 
Si, F{[si-i, Si)) = 1/N and {jf=i[si~i,Si) = [xo,c>o). If we denote by 2l„ the 
event that Xn^i, . . . ,Xn^n will fall into disjoint intervals, then, for Kn > 2, 

pH«.)=n(i-^)>i-|:io<i-^) 

i — 1 ^ 3n(n — 1) ^ ^ 1 

i=2 

On the event 21^ it holds 



2 jsi Arv- ~ n' 



$\sup_{s \le t < \tau} \hat n_{t,\tau}\mathcal{K}(\hat\theta_{n,t,\tau}, \theta) = \max_{s \le t < \tau,\ t,\tau \in J} \hat n_{t,\tau}\mathcal{K}(\hat\theta_{n,t,\tau}, \theta).$

Then



Pf{ sup nt,rlCi6n,t,T,0) > y 

\s<t<T 



(7.4) 

< Pf( V^(0n,i,r, d)>y) + l- PF(2ln). 



S<t<T, t,T&J 



According to Proposition 7.3 



^F{Ln,tA^n,t,r,0) > v) < 2n eyip{-ys) , 



where = | - §(1 - F{s))ds. Since Es<t<r, t,reJ < ^V2, from (7.4) one 



We end this section with an exponential bound for the statistic r„(t, r). 
Proposition 7.5. For any F £ and s>xq, 9 >0, y>0 it holds 



where ds = x^{Fs,Pe)- 

Proof. Let 6* > 0. Using (3.2) and the inequahty sup^gjp^ Ln,f(F) > 
L„,t(6'),one getsr„(t,r) <Ln,s{0 

The representation (2.8) implies Tn{t,T) < nt,TK,{6n,t,T,(^) + nTK,{9n,T-,G)- The 
assertion of the lemma follows from Proposition 7.4. □ 

8. Auxiliary statements. 

Lemma 8.1. For any 61,62 >0 such that lC{6i,92) <\ it holds 



gets 




□ 





and for any 61,62 > such that log^ ^ < |, it holds 



(8.2) }C{9„62)<llog'^f. 
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Proof. Note that ^log'^{x + 1) < G{x) for any x satisfying G{x) < 1/2 
and G{x) < | log^(x + 1) for any x satisfying log^(x + 1) < |. The assertion 
of the lemma follows directly from these inequalities. □ 



Lemma 8.2. For any sequence of positive numbers 9i,...,6][i such that 

M-l 

J2 ^mA+i)<\ 



1=1 



it holds 
(8.3) 



i=l 



Proof. To prove (8.3) note that by (8.1), 



1 ^1 
logT- 

Then using (8.2), 



<E 

1=1 



log- 



A/-1 



i=l 



1 '^1 



which in conjunction with (8.1) proves (8.3). □ 

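Lemma 8.3. If the sequence $\tau_n \ge x_0$, $n = 1, 2, \dots$, is such that $n_{\tau_n} \to \infty$
as $n \to \infty$, then $\hat n_{\tau_n} \asymp n_{\tau_n}$ as $n \to \infty$.

Proof. By Chebyshev's exponential inequality, for any $u > 0$ and $\varepsilon \in (0,1)$,

$P_F(\hat n_{\tau_n}/n_{\tau_n} < 1 - \varepsilon) \le \exp(u(1 - \varepsilon)n_{\tau_n} + n_{\tau_n}(e^{-u} - 1)) \le \exp(-u\varepsilon n_{\tau_n} + u^2 n_{\tau_n}).$

In the same way $P_F(\hat n_{\tau_n}/n_{\tau_n} > 1 + \varepsilon) \le \exp(-u\varepsilon n_{\tau_n} + u^2 n_{\tau_n})$. Choosing
$u = \varepsilon/2$ one gets

$P_F\left(\left|\frac{\hat n_{\tau_n}}{n_{\tau_n}} - 1\right| > \varepsilon\right) \le 2\exp\left(-\frac{\varepsilon^2}{4}n_{\tau_n}\right).$

Since $n_{\tau_n} \to \infty$ one gets the first assertion. □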
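
The final bound of this proof can be compared with the exact deviation probability, since the number of exceedances of a fixed threshold among n i.i.d. observations is binomial with success probability $1 - F(\tau_n)$. The following sketch is an illustration only: the bound is taken in the form $2\exp(-\varepsilon^2 n_{\tau_n}/4)$, `p` stands for $1 - F(\tau_n)$, and all function names are hypothetical.

```python
import math


def binom_pmf(n, k, p):
    """Binomial probability P(N = k), computed on the log scale for stability."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def deviation_probability(n, p, eps):
    """Exact P(|N / (n p) - 1| > eps) for N ~ Binomial(n, p), with 0 < p < 1."""
    n_tau = n * p
    return sum(binom_pmf(n, k, p) for k in range(n + 1)
               if abs(k / n_tau - 1.0) > eps)


def exponential_bound(n, p, eps):
    """Upper bound 2 * exp(-eps^2 * n p / 4) obtained with u = eps / 2."""
    return 2.0 * math.exp(-eps ** 2 * n * p / 4.0)


for n, p, eps in [(1000, 0.05, 0.5), (500, 0.02, 0.9)]:
    print(n, p, eps, deviation_probability(n, p, eps), exponential_bound(n, p, eps))
```

The bound is not sharp, but it decays exponentially in $n_{\tau_n} = np$, which is all that the lemma requires.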

Lemma 8.4. For any sequence $\tau_n$, $n = 1, 2, \dots$, satisfying $n_{\tau_n} \to \infty$ as
$n \to \infty$, it holds $\lim_{n\to\infty} P_F(X_{n,k} > \tau_n) = 1$, for any given natural number
k.

Proof. By Lemma 8.3, $P_F(\hat n_{\tau_n} \ge k) \to 1$ as $n \to \infty$. Since $P_F(X_{n,k} > \tau_n) =
P_F(\hat n_{\tau_n} \ge k)$ we obtain the assertion of the lemma. □



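Lemma 8.5. Assume that $P \sim Q$. Then

$\chi^2(P,Q) \le \mathbf{E}_P\left(\log^2\frac{dQ}{dP}\exp\left|\log\frac{dQ}{dP}\right|\right).$

Proof. It is easy to see that $\chi^2(P,Q) = \int g\left(\frac{dQ}{dP}\right)dP$, where $g(x) = (x-1)^2/x$.
Since $(x - 1)^2 \le e^{2\log x}\log^2 x$, for $x \ge 1$, and $(x - 1)^2 \le \log^2 x$, for $x \in (0,1)$,
we get $g(x) \le \log^2 x\,\exp(|\log x|)$, for $x > 0$. □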
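
The elementary inequality behind this proof is easy to check numerically. The sketch below assumes that the function in question is $g(x) = (x-1)^2/x$, which is the form consistent with the two inequalities quoted in the proof, and compares it with $\log^2 x\,\exp(|\log x|)$ on a logarithmic grid. The grid and the function names are arbitrary choices made for the illustration.

```python
import math


def g(x):
    """Integrand of the chi-square distance, assumed to be (x - 1)^2 / x."""
    return (x - 1.0) ** 2 / x


def upper(x):
    """Upper bound log^2(x) * exp(|log x|) from the proof."""
    return math.log(x) ** 2 * math.exp(abs(math.log(x)))


# Logarithmic grid covering x from exp(-10) to exp(10).
grid = [math.exp(-10.0 + 0.01 * i) for i in range(2001)]
worst = max(g(x) - upper(x) for x in grid)
print(worst)  # non-positive (up to rounding) if the inequality holds on the grid
```

Writing $x = e^t$, the inequality reads $4\sinh^2(t/2) \le t^2 e^{|t|}$, which explains why the two sides agree to second order near $x = 1$.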



Proposition 8.6. Assume that d.f.'s F and G are such that it holds 
Pt = sup3,>j /3*(aj?(x), aG'(x)) < Eq and J^{1 + Iogx)^x^oFt((ix) < ei. Then 
X^{Ft,Gt) <C{so,si)p'^, where C(eo, ei) = eie'^o. 



Proof. Since

$\log\frac{dF_t}{dG_t}(x) = \log\frac{a_G(xt)}{a_F(xt)} + \int_t^{xt}\left(\frac{1}{a_G(u)} - \frac{1}{a_F(u)}\right)\frac{du}{u}, \qquad x \ge 1,$

it holds $\left|\log\frac{dF_t}{dG_t}(x)\right| \le \rho_t(1 + \log x)$. Using Lemma 8.5, with $P = F_t$ and
$Q = G_t$ one gets $\chi^2(F_t, G_t) \le \rho_t^2 e^{\rho_t}\int(1 + \log x)^2 x^{\rho_t}F_t(dx)$. This implies
the assertion of the proposition. □